Department of Biostatistics, Johns Hopkins School of Public Health
For each second and each person:
Obtain joint distribution of acceleration and lag acceleration for a series of lags
Calculate scalar summaries of the joint distribution
I will walk through the process for one second, one person, and one lag
Intuition: walking is a cyclic process, and we want to leverage that cyclic structure.
Hat tip to Edward Gunning for the idea for these figures
Toy example: 4 observations per second, 2 seconds, 1 individual
\(v_j(s)\): \(s^{th}\) acceleration observation in second \(j\)
data \[\begin{bmatrix} v_1(1) & v_1(2) & v_1(3) & v_1(4) \\ v_2(1) & v_2(2) & v_2(3) & v_2(4) \\ \end{bmatrix} \]
acceleration matrix \[\begin{bmatrix} v_1(2) & v_1(3) & v_1(4) & v_1(3) & v_1(4) & v_1(4) \\ v_2(2) & v_2(3) & v_2(4) & v_2(3) & v_2(4) & v_2(4) \\ \end{bmatrix} \]
lag acceleration matrix \[\begin{bmatrix} v_1(1) & v_1(1) & v_1(1) & v_1(2) & v_1(2) & v_1(3) \\ v_2(1) & v_2(1) & v_2(1) & v_2(2) & v_2(2) & v_2(3) \\ \end{bmatrix} \]
lag matrix \[\begin{bmatrix} 1 & 2 & 3 & 1 & 2 & 1\\ 1 & 2 & 3 & 1 & 2 & 1\\\end{bmatrix} \]
Number of columns: \(4 \cdot (4-1) / 2 = 6\), one for each pair of an observation and an earlier observation within the same second
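The construction in the toy example can be sketched in a few lines of Python. This is an illustration only, not the code used for the analysis; the function name `build_lag_matrices` and the string placeholders are assumptions made for the sketch. The column order is chosen to reproduce the matrices shown above.

```python
def build_lag_matrices(data):
    """Build the acceleration, lag acceleration, and lag matrices.

    data: list of rows, one row per second, each holding the S
    observations recorded in that second.
    """
    n_obs = len(data[0])
    # Pair each earlier position t with every later position t + u,
    # for lags u = 1, ..., S - 1 - t (t is zero-based here).
    pairs = [(t, u) for t in range(n_obs - 1) for u in range(1, n_obs - t)]
    accel = [[row[t + u] for t, u in pairs] for row in data]
    lag_accel = [[row[t] for t, _ in pairs] for row in data]
    lag = [[u for _, u in pairs] for _ in data]
    return accel, lag_accel, lag


# Toy example: 4 observations per second, 2 seconds, 1 individual.
toy = [
    ["v1(1)", "v1(2)", "v1(3)", "v1(4)"],
    ["v2(1)", "v2(2)", "v2(3)", "v2(4)"],
]
accel, lag_accel, lag = build_lag_matrices(toy)
n_cols = len(accel[0])  # S * (S - 1) / 2 columns per row
```

Each row of the three outputs describes one second, and the three matrices are aligned column by column: an observation, the earlier observation it is paired with, and the lag between them.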
Model outcomes as:
\[Y_{ij}^{i_0}\sim\text{Bernoulli}(p_{ij}^{i_0})\]
where \(Y_{ij}^{i_0} = 1\) if the data from second \(j\) of subject \(i\) belong to subject \(i_0\) (that is, \(i = i_0\)), and 0 otherwise
Model:
\[\text{logit}(p_{ij}^{i_0}) =\beta_0^{i_0} + \int_{u=1}^S\int_{s=u}^SF_{i_0}\{ v_{ij}(s), v_{ij}(s-u), u\}dsdu \]
\(u = 1, \dots, S\): lag length, with \(S = 100\) the number of observations per second
\(v_{ij}(s)\) = acceleration at centisecond \(s\) for subject \(i\) in second \(j\)
\(F_{i_0}(\cdot, \cdot, \cdot)\): trivariate smooth function, specific to subject \(i_0\), defined at every point in the domain of acceleration, lag acceleration, and lag length
Fit using penalized splines with a quadratic penalty on the functional coefficient (Wood 2016)
\(\texttt{te()}\): tensor product smooth
\(\texttt{k = c(5, 5, 5)}\) number of basis functions for each dimension of the tensor product smooth
\(\texttt{weight\_mat}\): matrix of weights for the linear functional of the smooth term. We use equal weights, so every entry is \(\texttt{1/ncol(accel\_mat)}\)
\(\texttt{method="REML"}\): smoothing parameter selection with restricted maximum likelihood
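To make the role of the weights concrete: with equal weights, the double integral in the model is approximated by the average of the smooth function over the columns of the acceleration, lag acceleration, and lag matrices. The Python sketch below shows that weighted sum for one second of data. It is not the fitting code; `linear_predictor` and `toy_F` are hypothetical names, and `toy_F` is an arbitrary stand-in for the estimated smooth.

```python
import math


def linear_predictor(beta0, F, accel_row, lag_accel_row, lag_row):
    """Intercept plus the equally weighted sum of F over all columns."""
    weight = 1.0 / len(accel_row)  # same value as every entry of weight_mat
    total = sum(
        weight * F(a, b, u)
        for a, b, u in zip(accel_row, lag_accel_row, lag_row)
    )
    return beta0 + total


def inverse_logit(eta):
    """Map the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def toy_F(accel, lag_accel, lag):
    # Arbitrary stand-in for the fitted trivariate smooth (assumption).
    return accel * lag_accel / lag


# One second with two (acceleration, lag acceleration, lag) columns.
eta = linear_predictor(-1.75, toy_F, [2.0, 3.0], [1.0, 1.0], [1, 2])
prob = inverse_logit(eta)
```

In the fitted model the smooth is a tensor product spline whose coefficients are estimated, so only the weighting step carries over from this sketch.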
Rank-1 (rank-5) % accuracies
153 person dataset
3 min of walking each
Two sessions at least 1 week apart
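For reference, rank-k accuracy counts a prediction as correct when the true subject is among the k subjects with the highest predicted score. The Python sketch below computes it from a set of per-sample scores. The function name, the data layout (one dictionary of subject scores per sample), and the example values are assumptions for illustration, not results from the study.

```python
def rank_k_accuracy(scores, true_labels, k):
    """Percentage of samples whose true subject is in the top-k scores.

    scores: list of dicts mapping subject ID to predicted score.
    true_labels: list of true subject IDs, one per sample.
    """
    hits = 0
    for row, truth in zip(scores, true_labels):
        # Subjects ordered from highest to lowest predicted score.
        ranked = sorted(row, key=row.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(true_labels)


# Made-up scores for three samples and three candidate subjects.
example_scores = [
    {"A": 0.7, "B": 0.2, "C": 0.1},
    {"A": 0.5, "B": 0.3, "C": 0.2},
    {"A": 0.1, "B": 0.3, "C": 0.6},
]
example_truth = ["A", "B", "A"]
rank1 = rank_k_accuracy(example_scores, example_truth, 1)
rank2 = rank_k_accuracy(example_scores, example_truth, 2)
```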
5 open-source algorithms, 3 datasets with gold-standard step counts
How many steps does the average American take per day?
Do estimates differ by algorithm?
Are more steps associated with lower mortality risk?
Do males take more steps than females? At what points during the day?
Outcome: steps profile over the course of the day (function)
Predictors: age, sex (scalars)
Model: \[\mathbb{E}[\mathrm{steps}_i(s)] = \beta_0(s) + \beta_1(s)\mathrm{sex}_i + \beta_2(s)\mathrm{age}_i \] \(i\): participant; \(s \in \{1, \dots, 1440\}\): each minute of the day
Interpretation
Fast univariate inference (FUI) (Cui et al. 2021)
Fit a separate (univariate) GLM at each point \(s\), then smooth the resulting point estimates across \(s\)
Bootstrap subjects to get confidence bands
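The three steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the FUI implementation: it uses ordinary least squares with a single scalar predictor in place of a general GLM, a moving average as the smoother, and pointwise percentile intervals rather than joint confidence bands. All function names are hypothetical.

```python
import random


def ols_fit(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def moving_average(values, half_width):
    """Smooth a sequence with a centered window, truncated at the ends."""
    out = []
    for s in range(len(values)):
        window = values[max(0, s - half_width): s + half_width + 1]
        out.append(sum(window) / len(window))
    return out


def smoothed_slopes(x, curves, half_width=1):
    """Steps 1 and 2: pointwise fits, then smoothing across s."""
    n_points = len(curves[0])
    raw = [ols_fit(x, [c[s] for c in curves])[1] for s in range(n_points)]
    return moving_average(raw, half_width)


def bootstrap_band(x, curves, n_boot=200, half_width=1, seed=1):
    """Step 3: resample subjects (whole curves) with replacement."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    while len(draws) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        if len({x[i] for i in idx}) < 2:
            continue  # slope is undefined when all predictor values match
        draws.append(
            smoothed_slopes([x[i] for i in idx], [curves[i] for i in idx],
                            half_width)
        )
    lower, upper = [], []
    for s in range(len(curves[0])):
        vals = sorted(d[s] for d in draws)
        lower.append(vals[int(0.025 * (n_boot - 1))])
        upper.append(vals[int(0.975 * (n_boot - 1))])
    return lower, upper


# Noise-free toy data: the slope on age is -0.5 at every point s.
ages = [20, 30, 40, 50, 60, 70]
toy_curves = [[100 + 10 * s - 0.5 * a for s in range(5)] for a in ages]
estimate = smoothed_slopes(ages, toy_curves)
band = bootstrap_band(ages, toy_curves, n_boot=20)
```

Resampling whole subjects, rather than individual time points, keeps each person's curve intact and so preserves the within-subject correlation across the day.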
BUT: NHANES is not a simple random sample
Individuals are sampled in geographic clusters
Minority groups are oversampled
Are our estimates valid for population-level inference?
For standard regression, well-developed methods and software exist that take into account survey weights and within-cluster correlation (e.g. \(\texttt{svyglm}\), \(\texttt{svycoxph}\)) (Lumley 2010)
No such methods exist for function-on-scalar regression
FUI is built on separate GLMs
Idea: incorporate survey weights into the GLMs and use survey-aware replication/bootstrap methods for inference
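The point-estimate half of this idea can be sketched as a weighted least squares fit at each point \(s\). The Python below is an illustration under stated assumptions, not the proposed method: it covers a single scalar predictor, uses made-up data and weights, and leaves out the replication-based inference entirely. The function names are hypothetical.

```python
def weighted_fit(x, y, w):
    """Weighted least squares line; returns (intercept, slope)."""
    total = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / total
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def weighted_pointwise_fits(x, curves, w):
    """Apply the weighted fit separately at each point s."""
    n_points = len(curves[0])
    return [weighted_fit(x, [c[s] for c in curves], w) for s in range(n_points)]


# Made-up example: binary predictor, two time points, and survey weights
# in which participants 2 and 4 come from an oversampled group.
sex = [0, 0, 1, 1]
toy_curves = [[100, 110], [200, 210], [150, 160], [250, 260]]
survey_weights = [9, 1, 9, 1]

weighted = weighted_pointwise_fits(sex, toy_curves, survey_weights)
unweighted = weighted_pointwise_fits(sex, toy_curves, [1, 1, 1, 1])
```

In this toy example the oversampled participants have higher step counts, so the unweighted intercept at the first time point is 150 while the weighted intercept is 110. The slope is 50 in both cases because the weights shift both groups by the same amount.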
First-ever simulation study to evaluate function-on-scalar regression in complex survey settings